Quarkus, Are You Healthy?

2024 iThome Ironman Contest

DAY 5

Kubernetes

Part 5 of the series "When Quarkus Wants to Ride a Camel and Steer the Helmsman with an Eight-Legged Octopus"

chichi

2024-08-27 09:20:48

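Combining Kubernetes with microservices has changed both how developers work and the environments they run in. A service that used to be a single monolith may now be split into several microservices, and the more pieces a system is split into, the more likely it is that one of them fails.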

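A monolithic service typically sat behind a load balancer facing the clients, with several instances of the monolith behind it for high availability. Handling a failed application was a manual process: an administrator had to perform root cause analysis and fix the cause so that the same problem would not cost time again later. The only automated part was the load balancer, which could recognize HTTP 500 errors and stop forwarding traffic to the failed application. The load balancer might then detect a recovered server by occasionally sending it traffic, and resume normal forwarding.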

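In a Kubernetes architecture, Kubernetes improves the availability and scalability of services and automates their operation. To make this possible it provides probes: a set of health checks that Kubernetes runs against the containers in a Pod. These checks tell Kubernetes whether a container is healthy, and Kubernetes acts on the result, for example: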

Restart a failed container
Route traffic to healthy containers
Stop traffic from being forwarded to a failed Pod

Kubernetes provides the following probes:

livenessProbe

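Checks whether the container is still alive, that is, whether the application in the container is still running and able to respond to requests. If the liveness probe fails several times in a row, Kubernetes considers the container dead and the kubelet restarts it.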

readinessProbe

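Checks whether the container is ready to start accepting traffic. If the readiness probe fails, Kubernetes removes the Pod from the Endpoints of the Service resource until the probe succeeds again. This keeps traffic away from a failing Pod without any human intervention.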

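This probe is very useful while waiting for an application to finish time-consuming initial tasks, such as loading default files or warming a cache.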

startupProbe

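Checks whether the application has started successfully. This probe runs only during startup: the kubelet repeats it until it succeeds for the first time and then stops, unlike the readiness probe, which keeps firing periodically for the life of the container. Liveness and readiness probes are held back until the startup probe has succeeded. If the startup probe fails, the kubelet kills the container, which is then subject to its restart policy.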
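The reactions described above depend on consecutive results, not on a single failed check. The following is a minimal sketch of that bookkeeping, not the kubelet's actual implementation. `ProbeTracker` and `react` are hypothetical names introduced for illustration. The defaults mirror the `failureThreshold` (3) and `successThreshold` (1) defaults shown in the `kubectl explain` output later in this article. The model also assumes the container starts out healthy, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ProbeTracker:
    """Toy model of how consecutive probe results flip a container's state."""
    failure_threshold: int = 3   # consecutive failures before "failed"
    success_threshold: int = 1   # consecutive successes before "healthy" again
    healthy: bool = True         # simplification: assume healthy at the start
    _failures: int = 0
    _successes: int = 0

    def record(self, ok: bool) -> bool:
        """Feed one probe result; return the resulting healthy flag."""
        if ok:
            self._successes += 1
            self._failures = 0   # a success breaks the failure streak
            if not self.healthy and self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0  # a failure breaks the success streak
            if self.healthy and self._failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


def react(probe_kind: str, healthy: bool) -> str:
    """Map a probe's state to the action described in the text."""
    if healthy:
        return "no action"
    if probe_kind == "readiness":
        return "remove Pod from Service endpoints"
    return "restart container"  # liveness and startup


if __name__ == "__main__":
    liveness = ProbeTracker()
    state = True
    for result in [True, False, False, True, False, False, False]:
        state = liveness.record(result)
    print(react("liveness", state))
```

In this run the two early failures are forgiven because a success resets the streak. Only the three failures in a row at the end push the container over the threshold.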

Kubernetes offers many settings for these probes. The table below lists the basic ones.

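Kubernetes probe parameter	Description
initialDelaySeconds	Seconds to wait after the container starts before probing begins
periodSeconds	Interval between probes, in seconds
timeoutSeconds	Seconds to wait for a probe to complete before it times out
successThreshold	Minimum number of consecutive successes for the probe to count as successful after having failed
failureThreshold	Number of consecutive failures tolerated before giving up. Giving up on a liveness probe restarts the container. Giving up on a readiness probe stops traffic to the container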
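To show where these parameters sit in a Pod definition, here is a small sketch that builds a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The Pod name, image, and port 8080 are hypothetical. The paths are assumed to be the SmallRye Health defaults in recent Quarkus versions, so adjust them for your application. `startup_budget_seconds` is a helper introduced here for a rough estimate, not a Kubernetes API.

```python
import json


def http_probe(path: str, **timing: int) -> dict:
    """Build an httpGet probe; port 8080 is an assumed application port."""
    return {"httpGet": {"path": path, "port": 8080}, **timing}


def startup_budget_seconds(probe: dict) -> int:
    """Rough upper bound on the startup time the probe tolerates."""
    return (probe.get("initialDelaySeconds", 0)
            + probe.get("failureThreshold", 3) * probe.get("periodSeconds", 10))


container = {
    "name": "demo-app",                      # hypothetical name
    "image": "example.com/demo-app:latest",  # hypothetical image
    "startupProbe": http_probe("/q/health/started",
                               periodSeconds=5, failureThreshold=12),
    "livenessProbe": http_probe("/q/health/live",
                                initialDelaySeconds=0, periodSeconds=10,
                                timeoutSeconds=1, failureThreshold=3),
    "readinessProbe": http_probe("/q/health/ready",
                                 periodSeconds=10, timeoutSeconds=1,
                                 successThreshold=1, failureThreshold=3),
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-app"},
    "spec": {"containers": [container]},
}

if __name__ == "__main__":
    print(json.dumps(pod, indent=2))
    print("startup budget:", startup_budget_seconds(container["startupProbe"]), "s")
```

With `periodSeconds=5` and `failureThreshold=12`, the application gets roughly 60 seconds to start before the kubelet gives up on it.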

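Probes are periodic diagnostics that the kubelet performs on a container. How does it diagnose? The kubelet can run a script inside the container or send a network request, using one of the following mechanisms: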

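exec
- Runs the given command inside the container. The probe succeeds if the command exits with status code 0
grpc
- Performs a remote procedure call using gRPC
httpGet
- Performs an HTTP GET request against the container's IP address on the specified port and path. The probe succeeds if the status code is at least 200 and below 400
tcpSocket
- Performs a TCP check against the container's IP address on the specified port. The probe succeeds if the port is open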
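The success rules for three of these mechanisms can be approximated with the Python standard library. This is a sketch of the idea only, not how the kubelet is implemented. The function names are hypothetical, and `grpc` is left out because it needs a third-party library. The host, port, and path in the demo section are placeholders.

```python
import http.client
import socket
import subprocess
import urllib.error
import urllib.request


def exec_probe(command: list[str], timeout: float = 1.0) -> bool:
    """exec: success when the command exits with status 0 within the timeout."""
    try:
        completed = subprocess.run(command, timeout=timeout, capture_output=True)
        return completed.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False


def http_get_probe(host: str, port: int, path: str, timeout: float = 1.0) -> bool:
    """httpGet: success when the status code is >= 200 and < 400."""
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # urllib raises HTTPError (a URLError) for 4xx and 5xx responses.
        return False


def tcp_socket_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """tcpSocket: success when a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    import sys
    # Placeholder targets; point these at a real service to try them out.
    print("exec:", exec_probe([sys.executable, "-c", "raise SystemExit(0)"]))
    print("tcp :", tcp_socket_probe("127.0.0.1", 8080))
    print("http:", http_get_probe("127.0.0.1", 8080, "/q/health/live"))
```

Each function folds timeouts and connection errors into a plain failure, which keeps the sketch short.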

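After a probe runs, it has one of the following results:

Success
- The diagnosis passed
Failure
- The diagnosis did not pass
Unknown
- The diagnosis itself failed, so no action is taken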

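In Kubernetes you can use kubectl explain to see the definition of each field in detail. The field path shows that probes are defined at the container level. Only livenessProbe is listed here; readers can explore readinessProbe and startupProbe on their own.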

$ kubectl explain pods.spec.containers.livenessProbe
KIND:       Pod
VERSION:    v1

FIELD: livenessProbe <Probe>


DESCRIPTION:
    Periodic probe of container liveness. Container will be restarted if the
    probe fails. Cannot be updated. More info:
    https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
    Probe describes a health check to be performed against a container to
    determine whether it is alive or ready to receive traffic.

FIELDS:
  exec  <ExecAction>
    Exec specifies the action to take.

  failureThreshold      <integer>
    Minimum consecutive failures for the probe to be considered failed after
    having succeeded. Defaults to 3. Minimum value is 1.

  grpc  <GRPCAction>
    GRPC specifies an action involving a GRPC port.

  httpGet       <HTTPGetAction>
    HTTPGet specifies the http request to perform.

  initialDelaySeconds   <integer>
    Number of seconds after the container has started before liveness probes are
    initiated. More info:
    https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

  periodSeconds <integer>
    How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
    value is 1.

  successThreshold      <integer>
    Minimum consecutive successes for the probe to be considered successful
    after having failed. Defaults to 1. Must be 1 for liveness and startup.
    Minimum value is 1.

  tcpSocket     <TCPSocketAction>
    TCPSocket specifies an action involving a TCP port.

  terminationGracePeriodSeconds <integer>
    Optional duration in seconds the pod needs to terminate gracefully upon
    probe failure. The grace period is the duration in seconds after the
    processes running in the pod are sent a termination signal and the time when
    the processes are forcibly halted with a kill signal. Set this value longer
    than the expected cleanup time for your process. If this value is nil, the
    pod's terminationGracePeriodSeconds will be used. Otherwise, this value
    overrides the value provided by the pod spec. Value must be non-negative
    integer. The value zero indicates stop immediately via the kill signal (no
    opportunity to shut down). This is a beta field and requires enabling
    ProbeTerminationGracePeriod feature gate. Minimum value is 1.
    spec.terminationGracePeriodSeconds is used if unset.

  timeoutSeconds        <integer>
    Number of seconds after which the probe times out. Defaults to 1 second.
    Minimum value is 1. More info:
    https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

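Quarkus uses SmallRye Health to expose probe endpoints for an application. When the application integrates a database or a message queue, the health output also reports the status of those connections. Quarkus already takes care of the basics, so for developers this is an easy configuration to integrate. The next chapter works through this hands-on with Quarkus to understand probes in more depth.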
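As a preview of what such a health response looks like, SmallRye Health implements MicroProfile Health. Its JSON body carries an overall `status` plus a `checks` array, and the endpoint answers with HTTP 503 instead of 200 when the overall status is DOWN. The sketch below reads such a body. The sample payload is hand-written for illustration and its check names are hypothetical, not output captured from a real application. `failing_checks` is a helper invented here.

```python
import json

# Hand-written sample in the MicroProfile Health response shape.
# The check names are hypothetical, not taken from a real Quarkus app.
SAMPLE_BODY = """
{
  "status": "DOWN",
  "checks": [
    {"name": "database", "status": "UP"},
    {"name": "message-queue", "status": "DOWN"}
  ]
}
"""


def failing_checks(body: str) -> list[str]:
    """Return the names of all checks that are not UP in a health response."""
    report = json.loads(body)
    return [check["name"] for check in report.get("checks", [])
            if check.get("status") != "UP"]


if __name__ == "__main__":
    print(failing_checks(SAMPLE_BODY))
```

A single DOWN check makes the overall status DOWN, which is what turns a lost database or queue connection into a failed probe.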